Abstract

Propositional bounded model checking has been applied successfully to verify embedded software, but it is limited by the increasing size of the propositional formulae and by the loss of structure during the translation. These limitations can be reduced by encoding word-level information in theories richer than propositional logic and using SMT solvers for the generated verification conditions. Here, we investigate the application of different SMT solvers to the verification of embedded software written in ANSI-C. We have extended the encodings from previous SMT-based bounded model checkers to provide more accurate support for finite variables, bit-vector operations, arrays, structures, unions and pointers. We have integrated the CVC3, Boolector, and Z3 solvers with the CBMC front-end and evaluated them using both standard software model checking benchmarks and typical embedded applications from telecommunications, control systems and medical devices. The experiments show that our approach can analyze larger problems and substantially reduce the verification time.

1. Introduction 

Bounded Model Checking (BMC) based on Boolean Satisfiability (SAT) has been introduced as a complementary technique to Binary Decision Diagrams (BDDs) for alleviating the state explosion problem [1]. The basic idea of BMC is to check (the negation of) a given property at a given depth: given a transition system $M$, a property $\phi$, and a bound $k$, BMC unrolls the system $k$ times and translates it into a verification condition $\psi$ such that $\psi$ is satisfiable if and only if $\phi$ has a counterexample of depth less than or equal to $k$. Standard SAT checkers can be used to check whether $\psi$ is satisfiable. In order to cope with increasing system complexity, SMT (Satisfiability Modulo Theories) solvers can be used as back-ends for solving the verification conditions generated from the BMC instances [?], [?], [?], [?]. In SMT, predicates from various (decidable) theories are not encoded by propositional variables as in SAT, but remain in the problem formulation. These theories are handled by dedicated decision procedures. Thus, in SMT-based BMC, $\psi$ is a quantifier-free formula in a decidable subset of first-order logic, which is then checked for satisfiability by an SMT solver.

In order to reason about embedded software accurately, an SMT-based BMC must consider a number of issues that are not easily mapped into the theories supported by SMT solvers. In previous work on SMT-based BMC for software [?], [?], [?], only the theories of uninterpreted functions, arrays and linear arithmetic were considered, but no encoding was provided for ANSI-C constructs such as bit operations, floating-point arithmetic, pointers (e.g., pointer arithmetic and comparisons) and unions. This limits the usefulness of these approaches for analyzing and verifying embedded software written in ANSI-C. In addition, the SMT-based BMC approach proposed by Armando et al. [?] does not support the checking of arithmetic overflow and does not make use of high-level information to simplify the unrolled formula. We address these limitations by exploiting the different background theories of SMT solvers to build an SMT-based BMC tool that precisely translates program expressions into quantifier-free formulae and applies a set of optimization techniques to prevent overburdening the solver. This way we achieve significant performance improvements over SAT-based BMC and previous work on SMT-based BMC.

This work makes two major novel contributions. First, we provide details of an accurate translation from ANSI-C programs into quantifier-free formulae. Second, we demonstrate that the new approach improves the performance of software model checking for a wide range of embedded software systems. Additionally, we show that our encoding allows us to reason about arithmetic overflow and to verify programs that make use of bit-level operations, pointers, unions and floating-point arithmetic. We also use three different SMT solvers (CVC3, Boolector and Z3) in order to check the effectiveness of our encoding techniques. To the best of our knowledge, this is the first work that reasons accurately about ANSI-C constructs commonly found in embedded software and extensively applies SMT solvers to check the verification conditions emerging from the BMC of industrial embedded software applications. We describe the ESW-CBMC tool that extends the C Bounded Model Checker (CBMC) to support different SMT solvers in the back-end and to make use of high-level information to simplify and reduce the unrolled formula size. Experimental results obtained with ESW-CBMC show that our approach scales significantly better than both the CBMC model checker [?] and SMT-CBMC, a bounded model checker for C programs that is based on the SMT solvers CVC3 and Yices [?], [?].

2. Background 

ESW-CBMC uses the front-end of CBMC to generate the verification conditions (VCs) for a given ANSI-C program. However, instead of passing the VCs to a propositional SAT solver, we convert them using different background theories and pass them to an SMT solver. In this section, we describe the main features of CBMC and present the background theories used in the rest of the paper.

2.1. C Bounded Model Checker 

CBMC implements BMC for ANSI-C/C++ programs using SAT solvers [7]. It can process C/C++ code using the goto-cc tool [8], which compiles the C/C++ code into equivalent GOTO-programs using a gcc-compliant style. Alternatively, CBMC uses its own internal parser, based on Flex/Bison, to process the C/C++ files and to build an abstract syntax tree (AST). The typechecker annotates this AST with types and generates a symbol table. CBMC's IRep class then converts the annotated AST and the C/C++ GOTO-programs into an internal, language-independent format used by the remaining phases of the front-end.

CBMC derives the VCs using two recursive functions that compute the assumptions or constraints (i.e., variable assignments) and the properties (i.e., safety conditions and user-defined assertions). CBMC's VC generator (VCG) automatically generates safety conditions that check for arithmetic overflow and underflow, array bounds violations and null-pointer dereferences. Both functions accumulate the control-flow predicates leading to each program point and use them to guard both the constraints and the properties, so that they properly reflect the program's semantics.

Although CBMC implements several state-of-the-art techniques for propositional BMC, it still has the following limitations [?], [?]: (i) large data-paths involving complex expressions lead to large propositional formulae, (ii) high-level information is lost when the verification conditions are converted into propositional logic, and (iii) the size of the encoding increases with the size of the arrays used in the program.

2.2. Satisfiability Modulo Theories 

SMT decides the satisfiability of first-order formulae using a combination of different background theories and thus generalizes propositional satisfiability by supporting uninterpreted functions, arithmetic, bit-vectors, tuples, arrays, and other decidable first-order theories. SMT solvers are decision procedures for certain theories: given a decidable theory $T$ and a quantifier-free formula $\psi$, they check whether $\psi$ is satisfiable in $T$, or equivalently, whether $T \cup \{\psi\}$ is satisfiable. Given a set $\Gamma \cup \{\psi\}$ of formulae over $T$, we say that $\psi$ is a $T$-consequence of $\Gamma$, and write $\Gamma \models_T \psi$, if and only if every model of $T \cup \Gamma$ is also a model of $\psi$. Checking $\Gamma \models_T \psi$ can be reduced in the usual way to checking the $T$-satisfiability of $\Gamma \cup \{\neg\psi\}$.

In SMT-based bounded model checking, we unroll the transition system $M$ and the property $\psi$ (which is to be checked in $T$), yielding $M_k$ and $\psi_k$ respectively, and pass these to an SMT solver to check $M_k \models_T \psi_k$ [?], i.e., the $T$-satisfiability of $M_k \cup \{\neg\psi_k\}$. Since $T$ is decidable, the solver always terminates with a satisfiable/unsatisfiable answer. If the answer is satisfiable, we have found a violation of the property $\psi$. If it is unsatisfiable, the property $\psi$ holds in $M$ up to the given bound $k$.
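To make the satisfiable/unsatisfiable reading concrete, the following C sketch performs bounded model checking on a hypothetical toy transition system: a counter that starts in state 0 or 1 and adds 2 modulo 8 at each step, checked against the property x != 5. Explicit enumeration of the unrolled paths stands in for the SMT solver here; an SMT-based BMC would instead pass $M_k$ and $\neg\psi_k$ to the solver and let it search for a satisfying assignment symbolically. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy system: initial states 0 and 1, each step adds 2 modulo 8. */
static const int init_states[] = {0, 1};
static int step(int x) { return (x + 2) % 8; }

/* Hypothetical property to be checked in every reachable state. */
static bool property(int x) { return x != 5; }

/* Returns the smallest depth d <= k at which some path violates the
   property, or -1 if the property holds on all paths up to depth k. */
int bmc(int k) {
    int n = (int)(sizeof init_states / sizeof init_states[0]);
    for (int d = 0; d <= k; d++) {
        for (int s = 0; s < n; s++) {
            int x = init_states[s];
            for (int t = 0; t < d; t++)
                x = step(x);              /* unroll the transition d times */
            if (!property(x))
                return d;                 /* counterexample of depth d */
        }
    }
    return -1;                            /* no violation within bound k */
}
```

Here bmc(1) returns -1 (no violation within one step), while bmc(2) returns 2, because the path 1, 3, 5 violates the property at depth 2.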

State-of-the-art SMT solvers support not only the combination of different decidable theories, but also the integration of SAT solvers in order to improve performance. Furthermore, they often integrate a simplifier which applies standard algebraic reduction rules before bit-blasting the expressions (i.e., replacing the word-level operators by bit-level circuit equivalents) and passing the resulting propositional formula to a SAT solver. Background theories vary between solvers, but the SMT-LIB initiative aims at establishing a common standard for the specification of background theories [9]. However, most SMT solvers provide functions in addition to those specified in the SMT-LIB. Therefore, we describe here all the fragments that we found in the SMT solvers CVC3, Boolector and Z3 for the theories of linear, non-linear, and bit-vector arithmetic [10], [11], [12]. We summarize the syntax of these background theories as follows:

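$$
\begin{array}{rcl}
\mathit{Fml} & ::= & \mathit{Fml}\ \mathit{con}\ \mathit{Fml} \mid \neg\mathit{Fml} \mid \mathit{Atom} \\
\mathit{con} & ::= & \wedge \mid \vee \mid \oplus \mid \Rightarrow \mid \Leftrightarrow \\
\mathit{Atom} & ::= & \mathit{Trm}\ \mathit{rel}\ \mathit{Trm} \mid \mathit{Id} \mid \mathit{true} \mid \mathit{false} \\
\mathit{rel} & ::= & < \mid \leq \mid > \mid \geq \mid = \mid \neq \\
\mathit{Trm} & ::= & \mathit{Trm}\ \mathit{op}\ \mathit{Trm} \mid \mathit{Const} \mid \mathit{Id} \mid \mathit{Extract}[i,j] \\
 & \mid & \mathit{SignExt}[k] \mid \mathit{ZeroExt}[k] \mid \mathit{ite}(\mathit{Fml}, \mathit{Trm}, \mathit{Trm}) \\
\mathit{op} & ::= & +_{o,u} \mid -_{o,u} \mid *_{o,u} \mid /_{o} \mid \mathit{rem} \\
 & \mid & \ll \mid \gg \mid \& \mid \mathtt{|} \mid \oplus \mid @
\end{array}
$$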

In this grammar Fml denotes Boolean-valued expressions, Trm denotes integers, reals, and bit-vectors, while op denotes binary operators. The semantics of the relational operators (i.e., <, ≤, >, ≥), the non-linear arithmetic operators (i.e., *, /, rem) and the right-shift operator (>>) depends on whether the program variables are unsigned or signed bit-vectors, integers or real numbers. The expression Extract[i,j] denotes bit-vector extraction from bits $i$ down to $j$ to yield a new bit-vector of size $i - j + 1$, while @ denotes the concatenation of the given bit-vectors. SignExt[k] extends the bit-vector to the signed equivalent bit-vector of size $w + k$, where $w$ is the original width of the bit-vector, while ZeroExt[k] extends the bit-vector with zeros to the unsigned equivalent bit-vector of size $w + k$. The conditional expression ite(Fml, Trm, Trm) takes a Boolean formula as its first argument and, depending on its value, selects either the second or the third argument. The indexes $o$ and $u$ in the operators +, −, * and / denote predicates that check whether the bit-wise addition, subtraction, multiplication and division overflow and underflow, respectively. The operator rem denotes the signed or unsigned remainder.
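The word-level operators above can be mimicked on machine integers. The sketch below assumes bit-vectors of at most 32 bits stored in the low bits of a uint32_t and implements Extract[i,j], ZeroExt[k], SignExt[k] and @ with shifts and masks. The function names are hypothetical; they only illustrate the semantics and say nothing about how CVC3, Boolector or Z3 implement these operators.

```c
#include <assert.h>
#include <stdint.h>

/* Extract[i,j]: bits i down to j of x, a vector of width i - j + 1.
   Assumes 0 <= j <= i <= 31. */
uint32_t bv_extract(uint32_t x, unsigned i, unsigned j) {
    unsigned width = i - j + 1;
    uint32_t mask = (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (x >> j) & mask;
}

/* ZeroExt[k] of a w-bit vector: the new high bits are 0, so only the low w
   bits are kept. Assumes 1 <= w and w + k <= 32. */
uint32_t bv_zero_ext(uint32_t x, unsigned w, unsigned k) {
    (void)k;  /* the k new bits are zeros and do not change the value */
    return bv_extract(x, w - 1, 0);
}

/* SignExt[k] of a w-bit vector: bit w-1 is copied into the k new high bits.
   Assumes 1 <= w, 1 <= k and w + k <= 32. */
uint32_t bv_sign_ext(uint32_t x, unsigned w, unsigned k) {
    uint32_t v = bv_extract(x, w - 1, 0);
    if ((v >> (w - 1)) & 1u)                           /* sign bit is set */
        v |= bv_extract(UINT32_MAX, w + k - 1, w) << w;  /* k ones above bit w-1 */
    return v;
}

/* Concatenation x @ y, where y is wy bits wide (y < 2^wy).
   Assumes 1 <= wy <= 31 and a total width of at most 32 bits. */
uint32_t bv_concat(uint32_t x, uint32_t y, unsigned wy) {
    return (x << wy) | y;
}
```

For example, bv_sign_ext(0x80, 8, 8) yields 0xFF80, whereas bv_zero_ext(0x80, 8, 8) yields 0x0080.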

The array theories of SMT solvers are typically based on the two McCarthy axioms [?]. Let $a$ be an array, $i$ and $j$ be integers and $v$ be a value. The function $\mathit{select}(a,i)$ denotes the value of array $a$ at index $i$, and $\mathit{store}(a,i,v)$ denotes an array that is exactly the same as array $a$ except that the value at index position $i$ is $v$ (if $i$ is within the array bounds). Formally, the functions $\mathit{select}(a,i)$ and $\mathit{store}(a,i,v)$ can then be characterized by the following two axioms [10], [11], [12]:

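$$
\begin{array}{l}
i = j \Rightarrow \mathit{select}(\mathit{store}(a, i, v), j) = v \\
i \neq j \Rightarrow \mathit{select}(\mathit{store}(a, i, v), j) = \mathit{select}(a, j)
\end{array}
$$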

The first axiom asserts that the value selected at index $j$ is the same as the last value stored to index $i$, if the two indices $i$ and $j$ are equal. The second axiom asserts that storing a value to index $i$ does not change the value at index $j$, if the indices $i$ and $j$ are different.
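As an illustration of the two axioms, the C sketch below models arrays of a hypothetical fixed size N = 4 as structures, so that they are copied by value like the functional arrays of the theory, and checks both axioms for every index pair of a concrete array. Unlike the SMT theory, where arrays are total maps over an unbounded index domain, this sketch assumes all indices lie in 0..N-1; arr_select and arr_store are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define N 4  /* hypothetical array size for the illustration */

/* Wrapping the array in a struct gives it value semantics: it is copied
   when passed to or returned from a function. */
typedef struct { int elem[N]; } array_t;

/* select(a, i): the value of a at index i (assumes 0 <= i < N). */
int arr_select(array_t a, int i) { return a.elem[i]; }

/* store(a, i, v): a new array equal to a except at index i. The caller's
   array is not modified because a is a copy. */
array_t arr_store(array_t a, int i, int v) {
    a.elem[i] = v;
    return a;
}

/* Checks both McCarthy axioms for every pair of indices (i, j). */
bool mccarthy_axioms_hold(array_t a, int v) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int r = arr_select(arr_store(a, i, v), j);
            if (i == j && r != v) return false;                 /* first axiom */
            if (i != j && r != arr_select(a, j)) return false;  /* second axiom */
        }
    return true;
}
```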



Tuples are used to model the ANSI-C union and struct datatypes. They provide store and select operations similar to those on arrays, but working on the tuple elements. Hence, the expression $\mathit{select}(t,f)$ denotes the field $f$ of tuple $t$, while the expression $\mathit{store}(t,f,v)$ denotes a tuple that is equal to $t$ except that field $f$ has the value $v$; all other tuple elements remain the same.

3. ESW-CBMC 

This section describes the main software components that are integrated into the SMT-based back-end of CBMC and the encoding techniques that we use to convert the constraints and properties of ANSI-C embedded software into the background theories of the SMT solvers.

3.1. SMT-based CBMC Back-End 

Figure 1 shows the new back-end of CBMC, which supports the SMT solvers CVC3, Boolector and Z3. The gray boxes represent the components that we modified or added in the back-end of CBMC. We reused the front-end completely unchanged, i.e., we process the constraints and properties that CBMC's VCG generates for the unrolled C program in static single assignment (SSA) form. However, we implemented a new pair of encoding functions for each supported SMT solver and let the user select between them. The selected functions are then used to encode the given constraints and properties into a global logical context, using the background theories supported by the selected SMT solver. Finally, we invoke this solver to check the satisfiability of the context formula.




Figure 1. Overview of the SMT-based CBMC back-end.
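One possible way to organize the per-solver encoding functions is a table of function pointers with one entry per solver, from which the user's choice is looked up at run time. The sketch below assumes this organization; the type smt_backend_t, the function select_backend and the stub encoders are hypothetical and merely count how often they are called, whereas real encoders would add the translated formulae to the selected solver's logical context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: one pair of encoding functions per SMT solver. */
typedef struct {
    const char *solver;                              /* name selected by the user */
    int (*encode_constraint)(const char *constraint);
    int (*encode_property)(const char *property);
} smt_backend_t;

/* Stub encoders that only count their calls; real encoders would add the
   translated formula to the solver's global logical context. */
static int n_constraints, n_properties;
static int count_constraint(const char *c) { (void)c; return ++n_constraints; }
static int count_property(const char *p)   { (void)p; return ++n_properties; }

static const smt_backend_t backends[] = {
    {"cvc3",      count_constraint, count_property},
    {"boolector", count_constraint, count_property},
    {"z3",        count_constraint, count_property},
};

/* Returns the back-end registered under name, or NULL if there is none. */
const smt_backend_t *select_backend(const char *name) {
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (strcmp(backends[i].solver, name) == 0)
            return &backends[i];
    return NULL;
}
```

Keeping the choice in a table means the rest of the back-end only calls through the selected entry and never needs to know which solver is in use.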

Formally, we build two sets of quantifier-free formulae $C$ (for the constraints) and $P$ (for the properties) such that $C \wedge \neg P$ is unsatisfiable if and only if the property $P$ holds in the model $M$ up to the bound $k$. If $C \wedge \neg P$ is satisfiable, we have found a violation of the property $P$. However, this approach can only be used to find violations of the property up to the bound $k$, not to prove properties. For software verification, in order to prove properties we need to compute the completeness threshold, which determines the maximum number of loop iterations occurring in the program [?], [13]. Worst-case execution time (WCET) tools can be used to compute the completeness threshold by means of static analysis of loop structures. The WCET analysis essentially yields the maximum number of loop iterations, and CBMC and ESW-CBMC therefore adopt this approach to compute the maximum bound of the program. However, in practice, complex software programs involve large data-paths and complex expressions. Therefore, the resulting formulae become harder to solve and require substantial amounts of memory to build. Thus, for complex software programs, we can only ensure that the property holds in $M$ up to a bound $k$.

We use the code in Figure 2 as a running example to illustrate the process of transforming a given ANSI-C program into SSA form and then into the quantifier-free formulae $C$ and $P$ (as shown in (1) and (2)). Note that the code of Figure 2(a) is a syntactically valid C program, but it accidentally writes to an address outside the allocated memory region of the array a (line 6). Hence, in order to reason about this C program, seven VCs are generated as follows: the first six VCs check the lower and upper bounds of array a in lines 4, 6 and 7, respectively, and the last VC checks the assert macro defined by the user in line 7. However, before actually checking the properties, the front-end of CBMC performs a set of transformations and converts the program into SSA form. As a result, the original C program in Figure 2(a) is converted into an SSA form that only consists of if instructions, assignments and assertions, as shown in Figure 2(b).



int main() {
  int a[2], i, x;
  if (x==0)
    a[i]=0;
  else
    a[i+2]=1;
  assert(a[i+1]==1);
}

(a)

g1 == (x1 == 0)
a1 == (a0 WITH [i0:=0])
a2 == a0
a3 == (a2 WITH [2+i0:=1])
a4 == (g1 ? a1 : a3)
t1 == (a4[1+i0] == 1)

(b)

Figure 2. (a) A C program with a violated property, (b) the C program of (a) in SSA form.



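$$
C := \left[\begin{array}{l}
g_1 := (x_1 = 0) \\
\wedge\ a_1 := \mathit{store}(a_0, i_0, 0) \\
\wedge\ a_2 := a_0 \\
\wedge\ a_3 := \mathit{store}(a_2, 2 + i_0, 1) \\
\wedge\ a_4 := \mathit{ite}(g_1, a_1, a_3)
\end{array}\right] \qquad (1)
$$

$$
P := \left[\begin{array}{l}
i_0 \geq 0 \wedge i_0 < 2 \\
\wedge\ 2 + i_0 \geq 0 \wedge 2 + i_0 < 2 \\
\wedge\ 1 + i_0 \geq 0 \wedge 1 + i_0 < 2 \\
\wedge\ \mathit{select}(a_4, i_0 + 1) = 1
\end{array}\right] \qquad (2)
$$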



From this, we build the constraint and property formulae shown in (1) and (2). We use additional Boolean variables (called definition literals) for the clauses of the formula $P$, such that each definition literal is true iff the corresponding clause of $P$ is true. Hence, in the example we add a constraint for each clause of $P$, as shown in (3):

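$$
\begin{array}{l}
l_0 \Leftrightarrow i_0 \geq 0 \\
l_1 \Leftrightarrow i_0 < 2 \\
l_2 \Leftrightarrow 2 + i_0 \geq 0 \\
l_3 \Leftrightarrow 2 + i_0 < 2 \\
l_4 \Leftrightarrow 1 + i_0 \geq 0 \\
l_5 \Leftrightarrow 1 + i_0 < 2 \\
l_6 \Leftrightarrow \mathit{select}(a_4, i_0 + 1) = 1
\end{array} \qquad (3)
$$

Using these literals, we then rewrite the negation of (2) as:

$$
\neg P := \neg l_0 \vee \neg l_1 \vee \cdots \vee \neg l_6 \qquad (4)
$$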
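The following C sketch evaluates the definition literals of (3) and the disjunction (4) for concrete values, following (2) as written. The parameter sel is a hypothetical stand-in for the value of select(a_4, i_0 + 1), and the function names are illustrative. Since $l_0$ requires $i_0 \geq 0$ while $l_3$ requires $i_0 < 0$, at least one literal is false for every $i_0$, so $\neg P$ holds for every assignment and $C \wedge \neg P$ is satisfiable; this reflects the out-of-bounds write in line 6 of Figure 2(a).

```c
#include <assert.h>
#include <stdbool.h>

#define NCLAUSES 7

/* Definition literals l0..l6 of (3): l[k] is true iff clause k of P in (2)
   holds. sel stands for the value of select(a4, i0 + 1). */
void definition_literals(int i0, int sel, bool l[NCLAUSES]) {
    l[0] = i0 >= 0;      l[1] = i0 < 2;       /* bounds of a[i]   (line 4) */
    l[2] = 2 + i0 >= 0;  l[3] = 2 + i0 < 2;   /* bounds of a[i+2] (line 6) */
    l[4] = 1 + i0 >= 0;  l[5] = 1 + i0 < 2;   /* bounds of a[i+1] (line 7) */
    l[6] = sel == 1;                          /* user assertion   (line 7) */
}

/* Evaluates (4): not-P is the disjunction of the negated literals. */
bool not_p(int i0, int sel) {
    bool l[NCLAUSES];
    definition_literals(i0, sel, l);
    for (int k = 0; k < NCLAUSES; k++)
        if (!l[k])
            return true;   /* some clause of P is false */
    return false;
}
```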



Note also that we simplify the formulae $C$ and $P$ using local and recursive transformations in order to remove functionally redundant expressions and redundant literals. Finally, the resulting formula $C \wedge \neg P$ is passed to an SMT solver to check satisfiability. This differs from the approach by Armando et al. [?], who build two sets of quantifier-free formulae $C$ and $P$ and check whether $C \models_T P$ using an SMT solver. Moreover, they transform the C code into conditional normal form instead of the static single assignment form that we use in this work.



As mentioned in Section 2.2, modern SMT solvers provide ways to model the program variables as bit-vectors or as elements of a numerical domain (e.g., $\mathbb{Z}$, $\mathbb{R}$). If the program variables are modelled as bit-vectors of fixed size, then the result of the analysis can be precise (w.r.t. the ANSI-C semantics), depending on the size considered for the bit-vectors. On the other hand, if the program variables are modelled as numerical values, then the result of the analysis is independent of the actual binary representation, but the analysis may not be precise when arithmetic expressions are involved. For instance, the following formula is valid in numerical domains such as $\mathbb{Z}$ or $\mathbb{R}$:

$$
(a > 0 \wedge b > 0) \Rightarrow (a + b > 0) \qquad (5)
$$



However, it does not hold if $a$ and $b$ are interpreted as fixed-size bit-vectors, due to possible overflow in the addition operation (Section 3.3 explains how we encode arithmetic overflow). In our benchmarks, we noted that the majority of VCs are solved faster if we model the basic datatypes as integers and/or reals. Therefore, we have to trade off speed against accuracy, which can be competing goals in formal verification using SMT solvers. Speed results from omitting detail of the original C program, whereas accuracy results from including it. When encoding the constraints and properties of C programs into SMT, we allow the verification engineer to decide how to model the basic data types (i.e., as integer/real values or as bit-vectors) through a run-time option of ESW-CBMC.
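The difference between the two interpretations of (5) can be reproduced in C. The sketch below assumes 8-bit two's-complement bit-vectors purely for readability (the width is an arbitrary choice) and compares unbounded integer addition with a wrapping 8-bit adder; the function names are hypothetical. For a = b = 100 the premise of (5) holds, yet the 8-bit sum is -56.

```c
#include <assert.h>
#include <stdint.h>

/* Integer addition, as in the theory of integers (the sketch only uses
   small operands, so int does not overflow here). */
int add_int(int a, int b) { return a + b; }

/* 8-bit two's-complement addition, as a fixed-size bit-vector adder
   computes it: the sum wraps modulo 2^8 and is then read as a signed value. */
int add_bv8(int a, int b) {
    uint8_t sum = (uint8_t)(a + b);                  /* reduction modulo 256 */
    return sum >= 128 ? (int)sum - 256 : (int)sum;   /* interpret the sign bit */
}
```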

3.2. Code Optimizations 

The ESW-CBMC tool implements some standard code optimization techniques such as constant folding and forward substitution [14]. We observed that in a considerable number of embedded applications these optimization techniques have a significant impact on the performance of the tool. Constant folding, which is implemented in the front-end, allows us to replace arithmetic operations involving constants by other constants that represent the result of the operation. Figure 3 shows an example of constant folding applied to the cyclic redundancy check algorithm extracted from the SNU Real-Time benchmark [15].
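A minimal sketch of constant folding over a hypothetical expression tree follows. It folds, bottom-up and in place, every operator node whose operands are both constants, and leaves subtrees containing a symbolic variable untouched. For brevity it covers only the shift, and, and or operators that appear in the CRC loop and omits the lookups into the constant array it, which a folding pass could treat the same way once their indices are constant. The node layout and the name fold are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree: constants, a symbolic leaf, binary bit operators. */
typedef enum { CONST, VAR, SHL, SHR, AND, OR } kind_t;
typedef struct expr {
    kind_t kind;
    unsigned value;        /* meaningful only when kind == CONST */
    struct expr *l, *r;    /* operands of a binary operator */
} expr_t;

/* Constant folding: rewrites every operator node whose operands are both
   constants into a CONST node holding the result. */
void fold(expr_t *e) {
    if (e->kind == CONST || e->kind == VAR) return;
    fold(e->l);
    fold(e->r);
    if (e->l->kind != CONST || e->r->kind != CONST) return;  /* not foldable */
    unsigned a = e->l->value, b = e->r->value;
    switch (e->kind) {
    case SHL: e->value = a << b; break;   /* assumes b < width of unsigned */
    case SHR: e->value = a >> b; break;
    case AND: e->value = a & b;  break;
    case OR:  e->value = a | b;  break;
    default:  return;
    }
    e->kind = CONST;
    e->l = e->r = NULL;
}
```

With j = 0x25, the tree for (j & 0xF) << 4 | j >> 4 folds to the single constant 0x52, while an expression over a symbolic variable keeps its operator node.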



for (j=0; j<=255; j++) {
  icrctb[j]=icrcl(j<<8,(uchar)0);
  rc[j]=(uchar)(it[j&0xF]<<4 | it[j>>4]);
}

Figure 3. Code fragment of cyclic redundancy check.



The right-hand sides of the expressions in lines 2 and 3 are replaced by the corresponding constants, since the value of the variable j and all elements of the array it (an array of constants) are known at verification time. As a result, we can encode the expressions in lines 2 and 3 using only the function store of the SMT solvers (note that the function icrcl receives two arguments and returns an element of type unsigned char). We also observed that several embedded applications repeat the same expression many times at different places. If the values of the operands do not change between two evaluations of that expression, the expression can be forward substituted. Figure 4 shows an example of the forward substitution technique applied to the Fast Fourier Transform algorithm, also extracted from the SNU Real-Time benchmark [15].



typedef struct {
  float real, imag;
} complex;
complex x[1024], *xi;
for (le=n/2; le>0; le/=2) {
  for (j=0; j<le; j++) {
    for (i=j; i<n; i=i+2*le) {
      xi = x + i;

Figure 4. Code fragment of Fast Fourier Transform.

The right-hand side of the assignment in line 10 is repeated according to the bound used to model check this program. This occurs because the two outer for loops (lines 5-14 and lines 7-13) invoke the innermost for loop (lines 9-12) n times (where n represents the unwinding bound), and the address of the array x does not change inside the loops. For instance, if the bound is set to 1024, then the expression x + i that is assigned to the pointer xi is repeated 1024 times (note that this expression involves pointer arithmetic). We therefore store all expressions in a cache, so that when a given expression is processed again, we retrieve it from the cache instead of creating a new set of variables.
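The cache can be sketched as a table that maps each expression to the solver variable created for its first occurrence. In the C sketch below, expressions are keyed by their printed form (an assumption made for brevity; an implementation could instead hash the expression structure), lookup is a linear search, and lookup_or_create is a hypothetical name. Processing the expression x + i a thousand times then creates a single variable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 64   /* hypothetical capacity */
#define KEY_LEN 64      /* expressions are assumed to print in < 64 chars */

typedef struct { char key[KEY_LEN]; int var; } entry_t;

static entry_t cache[CACHE_SIZE];
static int n_entries;
static int next_var;    /* id of the next fresh solver variable */

/* Returns the variable already created for expr, or creates a fresh one.
   Returns -1 if the cache is full. */
int lookup_or_create(const char *expr) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(cache[i].key, expr) == 0)
            return cache[i].var;                 /* cache hit: reuse */
    if (n_entries == CACHE_SIZE)
        return -1;
    snprintf(cache[n_entries].key, KEY_LEN, "%s", expr);
    cache[n_entries].var = next_var++;           /* cache miss: new variable */
    return cache[n_entries++].var;
}
```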

3.3. Encodings 

3.3.1. Scalar Data Types. We provide two approaches to model unsigned and signed integer data types: either as the integers provided by the corresponding SMT-LIB theories or as bit-vectors, which are encoded using a particular bit width such as 32 bits. The relational operators (e.g., <, ≤, >, ≥), arithmetic operators (e.g., +, −, /, *, rem) and right-shift are encoded depending on whether the operands are unsigned or signed bit-vectors, integers or real numbers. We support all type casts, including conversions between integer and floating-point types. From the front-end's point of view, there are six scalar datatypes: bool, signedbv, unsignedbv, fixedbv, floatbv and pointer. At this point in time, we only support fixed-point arithmetic (i.e., fixedbv) for double and float, instead of floating-point arithmetic (i.e., floatbv).

The ANSI-C datatypes int, long int, long long int and char are considered as signedbv with different bit widths (depending on the machine architecture), and the unsigned versions of these datatypes are considered as unsignedbv. The conversions between signedbv, unsignedbv and fixedbv are performed using the word-level functions Extract[i,j], SignExt[k] and ZeroExt[k] (described in Section 2.2). Similarly, upon dereferencing, the object that the pointer points to is converted using the same word-level functions. The datatype bool is converted into signedbv and unsignedbv using ite. In addition, signedbv and unsignedbv are converted into bool using the operator ≠, by comparing the variable to be converted with zero. Formally, let $v$ be a variable of signed type, $k$ be a constant whose value is zero matching the type of $v$, and let $t$ be a Boolean variable such that $t \in \{0, 1\}$. We then convert $v$ into $t$ as follows:

$$
t = \begin{cases} 1 & \text{if } v \neq k \\ 0 & \text{if } v = k \end{cases} \qquad (6)
$$
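The correspondence between C conversions and these word-level functions can be observed on a conventional two's-complement target. The sketch below is illustrative only and its function names are hypothetical: narrowing keeps the low bits as Extract[7,0] does, widening an unsigned 8-bit value behaves like ZeroExt[24], widening a signed one like SignExt[24], and the conversions to and from bool follow ite and the comparison with zero described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* bool -> bit-vector: ite(t, 1, 0). */
uint32_t bool_to_bv(bool t) { return t ? 1u : 0u; }

/* bit-vector -> bool: compare v with a zero k of the same type. */
bool bv_to_bool(int32_t v) {
    const int32_t k = 0;
    return v != k;
}

/* Narrowing to 8 bits keeps the low byte, like Extract[7,0]. */
uint8_t narrow_to_8(uint32_t v) { return (uint8_t)(v & 0xFFu); }

/* Widening 8 -> 32 bits: zero extension (ZeroExt[24]) for unsigned
   operands, sign extension (SignExt[24]) for signed ones. */
uint32_t widen_unsigned(uint8_t v) { return v; }
int32_t widen_signed(int8_t v) { return v; }
```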

3.3.2. Arithmetic Overflow and Underflow. Arithmetic overflow and underflow are frequent sources of bugs in embedded software. ANSI-C, like most programming languages, provides basic data types that have a bounded range defined by the number of bits allocated to each of them. Some model checkers (e.g., SMT-CBMC, F-Soft and Blast [?], [?], [16]) either treat program variables as unbounded integers or do not generate VCs related to arithmetic overflow, and consequently can produce false positive results when a VC cannot violate the boundary condition. In our work, we encode VCs related to arithmetic overflow and underflow as follows. On arithmetic overflow of unsigned integer types (e.g., unsigned int, unsigned long int), the ANSI-C standard requires that the result be reduced modulo $2^w$ (i.e., $r \bmod 2^w$, where $r$ is the mathematical result of the operation that caused the overflow and $w$ is the width of the resulting type in bits) [?]. In other words, the result is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type. These semantics can be easily encoded using the background theories of the SMT solvers.
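Unsigned wraparound can be observed directly in C. The sketch below assumes $w = 32$ and shows that the built-in addition on uint32_t agrees with computing the exact sum in 64 bits and reducing it modulo $2^{32}$; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Built-in unsigned addition: C reduces the result modulo 2^32 (w = 32).
   The cast keeps the result 32 bits wide even where int is wider. */
uint32_t add_u32(uint32_t a, uint32_t b) { return (uint32_t)(a + b); }

/* The same reduction made explicit: compute the exact sum r in 64 bits,
   where it cannot wrap, and take r mod 2^32. */
uint32_t add_u32_mod(uint32_t a, uint32_t b) {
    uint64_t r = (uint64_t)a + b;
    return (uint32_t)(r % (UINT64_C(1) << 32));
}
```

For example, 4000000000 + 500000000 exceeds UINT32_MAX, and both functions return 4500000000 − 2^32 = 205032704.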

On the other hand, on arithmetic overflow of signed types (e.g., int, long int), the ANSI-C standard does not define any behaviour to detect signed integer overflow, and it only states that integer division-by-zero must be detected. As a result, we consider arithmetic overflow on addition, subtraction, multiplication, division and negation operations. Formally, let $\mathit{overflow}_*(x,y)$ denote a literal that is true if and only if the product of $x$ and $y$ is greater than LONG_MAX, and let $\mathit{underflow}_*(x,y)$ denote another literal that is true if and only if the product of $x$ and $y$ is less than LONG_MIN. Let $\mathit{res\_op}_*$ be a literal that denotes the validity of the signed multiplication. Then, we add the following constraint:

$$
\mathit{res\_op}_{*} \Leftrightarrow \neg\mathit{overflow}_{*}(x, y) \wedge \neg\mathit{underflow}_{*}(x, y)
$$

Addition, subtraction and division are encoded in a similar way, using the literals $\mathit{overflow}_+$, $\mathit{underflow}_+$, $\mathit{overflow}_-$, $\mathit{underflow}_-$ and $\mathit{overflow}_/$. However, the function $\mathit{overflow}_{\mathit{neg}}(x)$ for negation takes only one argument and returns true if and only if the negation of $x$ lies outside the interval given by LONG_MIN and LONG_MAX.
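The signed predicates can be sketched for a 32-bit type by computing the exact result in 64-bit arithmetic, where it cannot overflow, and comparing it against the bounds of the narrower type. In this sketch INT32_MAX and INT32_MIN stand in for LONG_MAX and LONG_MIN, and the function names mirror the notation above but are otherwise hypothetical. Division can only overflow for INT32_MIN / -1; division by zero is assumed to be checked separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The exact product of two 32-bit values always fits in 64 bits. */
bool overflow_mul(int32_t x, int32_t y)  { return (int64_t)x * y > INT32_MAX; }
bool underflow_mul(int32_t x, int32_t y) { return (int64_t)x * y < INT32_MIN; }

/* res_op_mul <=> !overflow_mul(x, y) && !underflow_mul(x, y) */
bool res_op_mul(int32_t x, int32_t y) {
    return !overflow_mul(x, y) && !underflow_mul(x, y);
}

/* Negation takes a single argument; only -INT32_MIN leaves the range. */
bool overflow_neg(int32_t x) { return -(int64_t)x > INT32_MAX; }

/* Division overflows only for INT32_MIN / -1 (y == 0 is checked elsewhere). */
bool overflow_div(int32_t x, int32_t y) { return x == INT32_MIN && y == -1; }
```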

3.3.3. Arrays. Arrays are encoded in a straightforward manner using the domain theories, and we consider the WITH operator and the index operator [] to be part of the encoding [?], [?]. These operators are mapped to the functions store and select of the array theory presented in Section 2.2, respectively. For the WITH operator, let $a$ be an array, $i$ be an integer variable, and $v$ be an expression with the type of the elements in $a$. The operator WITH takes $a$, $i$, and $v$ and returns an array that is exactly the same as array $a$ except that the value at index position $i$ is $v$ (if $i$ is within the array bounds). Formally, let $a'$ be $a$ WITH $[i] := v$, and let $j$ be an index of $a$; then:

$$
a'[j] = \begin{cases} v & \text{if } i = j \\ a[j] & \text{if } i \neq j \end{cases} \qquad (7)
$$

If an array index operation is out of bounds, the 
value of the index operator is a free variable, i.e., it is 
chosen non-deterministically. 

3.3.4. Structures and Unions. Structures and unions are encoded using the theory of tuples in SMT, and we map update and access operations to the functions store and select of the tuple theory presented in Section 2.2, respectively. We describe here only the encoding of structures; unions are encoded in a similar way. Let $w$ be a structure, $f$ be a field name of this structure, and $v$ be an expression matching the type of the field $f$. The expression store takes $w$, $f$, and $v$ and returns a tuple that is exactly the same as tuple $w$ except that the value at field $f$ is $v$; all other tuple elements remain the same. Formally, let $w'$ be $\mathit{store}(w, f, v)$ and let $j$ be a field name of $w$; then:

$$
w'.j = \begin{cases} v & \text{if } j = f \\ w.j & \text{if } j \neq f \end{cases} \qquad (8)
$$
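Because C passes structures by value, the tuple operations have a direct analogue in C: a function that receives a copy of $w$, updates one field and returns it behaves like $\mathit{store}(w, f, v)$ above. The sketch below uses the structure type of Figure 6 and hypothetical helper names; the nested update of the array field corresponds to $\mathit{store}(w, a, \mathit{store}(w.a, i, v))$. The value 99 used in the formulae for Figure 6 is the ASCII code of 'c', which the sketch assumes.

```c
#include <assert.h>

/* The structure type of Figure 6, used as a two-field tuple. */
struct x { int a[2]; char b; };

/* store(w, b, v): w is a copy, so only field b changes in the result and
   every other field keeps its value. */
struct x store_b(struct x w, char v) {
    w.b = v;
    return w;
}

/* store(w, a, store(w.a, i, v)): nested update of element i of field a
   (assumes 0 <= i < 2). */
struct x store_a_elem(struct x w, int i, int v) {
    w.a[i] = v;
    return w;
}
```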

3.3.5. Pointers. The ANSI-C language offers two dereferencing operators, *p and p[i], where p denotes a pointer (or array) and i denotes an integer index. The front-end of CBMC removes all pointer dereferences bottom-up during the unwinding phase. Therefore, ANSI-C pointers are treated as program variables, and CBMC's VCG generates two properties related to pointer safety: (i) check whether the pointer points to the correct object (represented by SAME_OBJECT) and (ii) check whether the pointer is neither NULL nor an invalid object (represented by INVALID_POINTER).

We thus encode pointers using two fields of a tuple. Let $p$ denote the tuple which encodes a pointer-type expression. The first field, $p.o$, encodes the object the pointer points to, while the second field, $p.i$, encodes an index within that object. Note that in our encoding the field $p.o$ is dynamically adjusted in order to accommodate the object that the pointer points to. This approach is similar to CBMC's encoding into propositional logic, but we use background theories such as tuples and bit-vector arithmetic, while CBMC encodes the two fields by concatenating bit-vectors.

Formally, let $p_a$ be a pointer expression that points to the object $a$, and $p_b$ be another pointer expression that points to the object $b$. Let $l_s$ be a literal; we then encode the property SAME_OBJECT by adding the following constraint:

$$
l_s \Leftrightarrow (p_a.o = p_b.o) \qquad (9)
$$

To check for invalid pointers, the NULL pointer is encoded with a unique identifier denoted by $\eta$, and the invalid object is denoted by $\nu$. Let $p$ denote a pointer expression. We then encode the property INVALID_POINTER by creating a literal $l_i$ and adding the following constraint:

$$
l_i \Leftrightarrow (p.o \neq \nu) \wedge (p.i \neq \eta) \qquad (10)
$$
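The pointer tuple and the constraints (9) and (10) can be sketched in C as follows. The integer codes chosen for $\nu$ and $\eta$ are hypothetical (any values that cannot denote a real object or index would do), and the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical codes for the invalid object (nu) and the NULL identifier (eta). */
enum { NU = -1, ETA = -2 };

/* A pointer as a two-field tuple: o is the object, i the index within it. */
typedef struct { int o; int i; } ptr_t;

/* (9): l_s <=> (p_a.o = p_b.o) */
bool same_object(ptr_t pa, ptr_t pb) { return pa.o == pb.o; }

/* (10): l_i <=> (p.o != nu) && (p.i != eta) */
bool valid_pointer(ptr_t p) { return p.o != NU && p.i != ETA; }
```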

Note that when a pointer points to a single element of a scalar data type (e.g., int, char), $p.i$ is always 0. However, in the case of an array consisting of elements of a scalar data type, $p.i$ is considered to be equal to the array index. To illustrate our encoding, we modified the C program of Figure 2(a) so that a pointer p points to the array a, as shown in line 3 of Figure 5. In addition to the constraints and properties shown in (1) and (2) (Section 3.1), the front-end generates one additional constraint (i.e., the front-end treats the assignment p=a in line 3 as p=&a[0]) and one additional VC (i.e., SAME_OBJECT(p, &a[0])) for the C program of Figure 5. The constraint p=&a[0] is encoded as follows: the first element of the tuple ($p.o$) contains the array a and the second element ($p.i$) contains the index, whose value is 0. In order to check the property specified by the assert macro in line 8, we first add the value 2 to $p.i$ and then check whether p and a point to the same element. As $p.i$ exceeds the size of the object stored in $p.o$, i.e., the array a, the VC is violated and thus the assert macro in line 8 fails.



1 int main() {
2   int a[2], i, x, *p;
3   p=a;
4   if (x==0)
5     a[i]=0;
6   else
7     a[i+2]=1;
8   assert(*(p+2)==1);
9 }

Figure 5. C program with pointer to an array.
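The reasoning about Figure 5 can be replayed with the same (object, index) representation. In this hypothetical sketch, object 0 stands for the two-element array a, p = a becomes the tuple (0, 0), and p + 2 only changes the index field; a bounds check on the index then rejects the dereference in line 8. The object table and function names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object table: object 0 is the array a of Figure 5 (int a[2]). */
static const int object_size[] = {2};

typedef struct { int o; int i; } ptr_t;   /* (object, index) tuple */

/* p = a is treated as p = &a[0]: object obj, index 0. */
ptr_t ptr_to_start(int obj) {
    ptr_t p = {obj, 0};
    return p;
}

/* p + n changes only the index field; the object field is unchanged. */
ptr_t ptr_add(ptr_t p, int n) {
    p.i += n;
    return p;
}

/* *p is safe only if the index lies inside the object
   (assumes p.o is a valid entry of object_size). */
bool deref_in_bounds(ptr_t p) {
    return p.i >= 0 && p.i < object_size[p.o];
}
```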

Structures consisting of n fields with scalar data types are manipulated like an array with n elements. This means that the front-end of CBMC allows us to encode such structures using the usual update and access operations. If the structure contains arrays, pointers and scalar data types, then $p.i$ points to the object within the structure only. As an example, Figure 6 shows a C program that contains a pointer to a struct consisting of two fields (an array a of integers and a char variable b). In order to reason about this C program, the front-end generates the constraints and properties, and we then encode the resulting formulae and pass them to the SMT solvers as $C \wedge \neg P$ (as shown in (11) and (12)).

As the struct y is declared as global in Figure 6 (lines 1-4), its members must be initialized before performing any operation, as shown in (11) (first line) [6]. The assignment p=&y (line 7 of Figure 6) is encoded by assigning the structure y to the field $p_1.o$ and the value 0 to the field $p_1.i$. However, the front-end does not generate any VC related to pointer safety, since there is no violation of the pointer p in the C program of Figure 6 (i.e., the pointer p points to the correct object). The front-end thus performs static checking and does not generate unnecessary VCs. Consequently, the pointer p, represented by the tuple $p_1$, is not used for reasoning about this program.



struct x {
  int a[2];
  char b;
} y;
int main(void) {
  struct x *p;
  p=&y;
  p->a[1]=1;
  p->b='c';
  assert(p->a[1]==1);
  assert(p->b=='c');
}

Figure 6. C program with pointer to a struct.



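$$
C := \left[ \begin{array}{l}
y_1 := \mathit{store}(\mathit{store}(y_0.a, 0, 0), 1, 0) \wedge y_1.b = 0 \wedge p_1.o := y \wedge p_1.i := 0 \\
\wedge\; y_2 := \mathit{store}(y_1, a, \mathit{store}(y_1.a, 0, 0)) \\
\wedge\; y_3 := \mathit{store}(y_2, a, \mathit{store}(y_2.a, 1, 1)) \\
\wedge\; y_4 := \mathit{store}(y_3, b, 99)
\end{array} \right] \tag{11}
$$

$$
P := \left[ \mathit{select}(\mathit{select}(y_4, a), 1) = 1 \wedge \mathit{select}(y_4, b) = 99 \right] \tag{12}
$$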
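Because every value in the constraints C of (11) is determined, C has exactly one model, so $C \wedge \neg P$ is unsatisfiable exactly when P holds in that model. The sketch below replays that model concretely in C, using the fact that C structs are copied by value to mimic the functional store operation that produces each new SSA version $y_1, \ldots, y_4$. It is an illustration under these assumptions, not solver input; the helper names store_a, store_b and figure6_property_holds are hypothetical, and comparing b with 99 assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stdbool.h>

/* Same layout as struct x in Figure 6 */
struct x {
    int a[2];
    char b;
};

/* store(t, a, store(t.a, k, v)): returns an updated copy and leaves the
   argument unchanged, like moving from one SSA version to the next. */
struct x store_a(struct x t, int k, int v) {
    assert(k >= 0 && k < 2);
    t.a[k] = v;
    return t;
}

/* store(t, b, v) */
struct x store_b(struct x t, char v) {
    t.b = v;
    return t;
}

/* Replays the chain of (11) and evaluates P from (12) on the result. */
bool figure6_property_holds(void) {
    struct x y1 = { {0, 0}, 0 };     /* zero-initialized global y */
    struct x y2 = store_a(y1, 0, 0); /* the y2 term of (11) */
    struct x y3 = store_a(y2, 1, 1); /* p->a[1] = 1 */
    struct x y4 = store_b(y3, 'c');  /* p->b = 'c' (99 in ASCII) */
    return y4.a[1] == 1 && y4.b == 99;
}
```

Value semantics is the design point here: each store yields a new tuple instead of mutating the old one, which is why the encoding needs distinct versions $y_1$ to $y_4$.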



4. Experimental Evaluation 



The experimental evaluation of our work consists of
three parts. The first part, in Section 4.1, contains the
results of applying ESW-CBMC to the verification of
fifteen ANSI-C programs using three different SMT
solvers: CVC3, Boolector and Z3. The purpose of
this first part is to identify the most promising
SMT solver for further development and experiments.
Each of CVC3, Boolector and Z3 is well suited to the
purpose for which it was written, and our intention
is to integrate all of them into the back-end of CBMC;
first, however, we need to prioritize the tasks. The
second part, described in Section 4.2, contains the
results of applying ESW-CBMC and SMT-CBMC to the
verification of the official benchmark of the SMT-CBMC
model checker. We use the official benchmark because
SMT-CBMC does not support some of the ANSI-C
constructs commonly found in embedded software
(e.g., bit operations, floating-point arithmetic, pointer
arithmetic). The purpose of this second part is thus
to evaluate ESW-CBMC's relative performance against
SMT-CBMC.

The third part, in Section 4.3, contains the
experimental results of applying CBMC and ESW-CBMC to
the verification of embedded software used in
telecommunications, control systems and medical devices.
The purpose of this third part is to evaluate ESW-CBMC's
relative performance against CBMC using standard
embedded software benchmarks. All experiments were
conducted on an otherwise idle Intel Xeon 5160 3 GHz
server with 4 GB of RAM running Linux. For
all benchmarks, the time limit was set to 3600
seconds for each individual property. All times given
are wall-clock times in seconds, as measured by the Unix
time command in a single execution.

4.1. Comparison of SMT solvers 

As a first step, we analyzed the extent to which the
SMT solvers support the domain theories that are
required for SMT-based BMC of ANSI-C programs.
For this purpose, we analyzed the following versions
of the SMT solvers: CVC3 (1.5), Boolector (1.0) and
Z3 (2.0). In the theory of linear and non-linear
arithmetic, Z3 and CVC3 do not support the remainder
operator, but they allow us to define axioms to support
it. Currently, Boolector does not support the theory
of linear and non-linear arithmetic. In the theory of
bit-vectors, CVC3 does not support the division and
remainder operators (/, rem) for bit-vectors representing
signed and unsigned integers. However, in all
these cases, axioms can be specified in order to improve
the coverage. Z3 and Boolector support all word-level,
bit-level, relational and arithmetic functions over unsigned
and signed bit-vectors. In the theories of arrays and
tuples, the verification problems only involve selecting
and storing elements from/into arrays and tuples,
respectively, so both domains comprise only two
operations. These operations are fully supported by
CVC3 and Z3, but Boolector does not support the
theory of tuples.
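The axioms used for the remainder operator are not listed here. As an illustration only, one standard way to characterize C99 integer remainder (quotient truncated toward zero) is: for b ≠ 0, there is a q with a = b·q + r, |r| < |b|, and r is either zero or has the sign of a. The sketch below states this candidate axiom as a plain C predicate (a hypothetical helper, not solver input) and checks exhaustively on a small range that C's own / and % satisfy it while neighbouring quotients do not, i.e., that the axiom pins down a unique remainder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Candidate axiom for r = rem(a, b) with C99 semantics (quotient truncated
   toward zero): a == b*q + r, |r| < |b|, and r is zero or has the sign of a. */
bool rem_axiom_holds(long a, long b, long q, long r) {
    if (b == 0)
        return false;                /* the axiom only constrains b != 0 */
    bool decomposes = (a == b * q + r);
    bool bounded = labs(r) < labs(b);
    bool sign_ok = (r == 0) || ((r < 0) == (a < 0));
    return decomposes && bounded && sign_ok;
}

/* For all a, b in [lo, hi] with b != 0, checks that C's / and % satisfy the
   axiom and that the quotients q-2, q-1, q+1, q+2 do not. */
bool rem_axiom_matches_c(int lo, int hi) {
    for (long a = lo; a <= hi; a++) {
        for (long b = lo; b <= hi; b++) {
            if (b == 0)
                continue;
            long q = a / b, r = a % b;
            if (!rem_axiom_holds(a, b, q, r))
                return false;
            for (long d = -2; d <= 2; d++) {
                if (d != 0 && rem_axiom_holds(a, b, q + d, a - b * (q + d)))
                    return false;    /* axiom would admit a second answer */
            }
        }
    }
    return true;
}
```

In an SMT encoding, the same three conditions would be asserted over a fresh quotient variable; the sign condition is what distinguishes truncating remainder from the non-negative modulus of mathematical integer division.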

Table 1. Results of the comparison between CVC3, Boolector and Z3. Time-outs are represented with TO
in the Time column; examples that exceed the available memory are represented with MO in the Time column.

In order to evaluate the SMT solvers, we used
a number of ANSI-C programs taken from standard
benchmark suites. The results of this first part are
shown in Table 1. The first seven programs are taken
from the benchmark suite of the SMT-CBMC model
checker [?]. These programs depend on a positive
integer N that defines the size of the arrays in the
programs and/or the number of iterations performed by
the program. Armando et al. have already shown that this
class of programs allows us to assess the scalability of
model checking tools on problems of increasing
complexity. The next four programs are taken
from the SNU Real-Time benchmark suite [15]. These
programs implement the insertion sort algorithm, the
Fibonacci function, the binary search algorithm and the
least mean-square (LMS) adaptive signal enhancement.
The next program is taken from the MiBench benchmark
and implements the root computation of cubic equations.
The following program is taken from the CBMC manual [?]
and implements the multiplication of two numbers using
bit operations. The last two programs are taken from
the High Level Synthesis benchmark suite [18] and
implement the encoder and decoder of the adaptive
differential pulse code modulation (ADPCM). Programs
8 to 15 contain typical ANSI-C constructs found in
embedded software, i.e., they contain linear and
non-linear arithmetic and make heavy use of bit
operations.
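To give a flavour of the bit-level programs in this set, here is a hypothetical program in the same spirit as the multiplication benchmark mentioned above; it is not the program from the CBMC manual. It multiplies two unsigned 32-bit numbers using only shifts, bit tests and additions, and states as an assertion the property a bounded model checker would check for arbitrary inputs. The names mul_shift_add and check_mul are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical benchmark-style program: multiplies two unsigned 32-bit
   numbers using only shifts, bit tests and additions (mod 2^32). */
uint32_t mul_shift_add(uint32_t x, uint32_t y) {
    uint32_t result = 0;
    while (y != 0) {        /* at most 32 iterations, one per bit of y */
        if (y & 1u)
            result += x;    /* add x * 2^k for each set bit k of y */
        x <<= 1;
        y >>= 1;
    }
    return result;
}

/* The property a bounded model checker would check for arbitrary x and y;
   the 64-bit product is truncated to 32 bits to match the wrap-around. */
void check_mul(uint32_t x, uint32_t y) {
    assert(mul_shift_add(x, y) == (uint32_t)((uint64_t)x * y));
}
```

Because the loop runs at most once per bit of y, its number of iterations is bounded by the word width, which is the kind of loop structure for which a fixed unwinding bound B is meaningful.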

Table 1 shows the results of the comparison between
CVC3, Boolector and Z3. The first column, #L, gives the
total number of lines of code; the second column, B,
gives the unwinding bound; and the third column, #P,
gives the number of properties to be verified for
each ANSI-C program. Size gives the total number of
variables needed to encode the constraints and
properties of the ANSI-C programs. Time gives the
average time in seconds to check all properties of a
given ANSI-C program, and Failed indicates how many
properties failed during the verification process. Here,
properties can fail for two reasons: either a time-out
(TO) or a memory-out (MO). As Table 1 shows, Z3 runs
slightly faster than Boolector and CVC3, except for the
ANSI-C programs StrCmp and SumArray. As mentioned
previously, the purpose of this evaluation is to
prioritize the integration of the SMT solvers into the
back-end of CBMC, not to determine the best SMT solver.
Since Z3 supports most of the occurring operations, we
chose to continue the development with Z3.
Table 2. Results of the comparison between
ESW-CBMC and SMT-CBMC.



4.2. Comparison to SMT-CBMC 

This subsection describes the evaluation of ESW-CBMC
against another SMT-based bounded model checker,
SMT-CBMC, developed in [2], [5]. In order to carry out
this evaluation, we took the official benchmark of the
SMT-CBMC tool, available at [19]. SMT-CBMC was invoked
by manually setting the file name and the unwinding
bound (i.e., SMT-CBMC -file name -bound n).
Furthermore, we compared the default solver of
SMT-CBMC (i.e., CVC3 1.5) against the default solver of
ESW-CBMC (i.e., Z3 2.0), and we also ran ESW-CBMC
connected to CVC3 1.5. Table 2 shows the results of
this evaluation.

If CVC3 is used as the SMT solver, both tools run out of
memory (although only after exceeding the time-out)
and, due to the many dynamic choice points (represented
by *), fail to analyze BubbleSort and SelectionSort
for large N (N = 140) as well as MinMax. This indicates
problems in the solver itself rather than in the
verification tools. In addition, SMT-CBMC runs out of
time when analyzing the program StrCmp. If Z3 is used
as the solver for ESW-CBMC, however, the difference
becomes even more noticeable: ESW-CBMC consistently
outperforms SMT-CBMC by a factor of 20 to 40.

4.3. Comparison to CBMC 

In order to evaluate ESW-CBMC's relative performance
against CBMC, we analyzed different benchmarks, namely
SNU Real-Time, PowerStone, NEC and NXP [?], [?], [?], [?].
The SNU Real-Time benchmarks contain ANSI-C programs
that implement cyclic redundancy check, Fast Fourier
Transform, LMS adaptive signal enhancement, JPEG,
matrix multiplication, LU decomposition and root
computation of quadratic equations. The PowerStone
benchmarks contain graphics applications, an ADPCM
encoder and decoder, paging communication protocols
and bit-shifting applications. The NEC benchmark
contains an implementation of the Laplace transform.
The NXP benchmarks are taken from the set-top box of
NXP Semiconductors that is used in high-definition
internet protocol (IP) and hybrid digital TV (DTV)
applications. The embedded software of this platform
relies on the Linux operating system and makes use of
different applications such as (i) LinuxDVB, which is
responsible for controlling the front-end, tuners and
multiplexers, (ii) DirectFB, which provides graphics
applications and input device handling, and (iii) ALSA,
which is used to control the audio applications. This
platform contains two embedded processors that exchange
messages via an inter-process communication (IPC)
mechanism.

We evaluated CBMC version 2.9, and we invoked
both tools (i.e., CBMC and ESW-CBMC) by manually
setting the file name, the unwinding bound and
the overflow check (i.e., CBMC file --unwind n
--overflow-check). Table 3 shows the results of
applying CBMC and ESW-CBMC to the verification of
the embedded software benchmarks.

As Table 3 shows, CBMC is not able to check the
programs fft1k and lms due to memory limitations.
Moreover, CBMC takes considerably more time than
ESW-CBMC to model check the programs ludcmp,
qurt and laplace. ESW-CBMC also runs faster than
CBMC for the programs adpcm, exStbHDMI and
exStbLED. The only case in which CBMC runs faster
than ESW-CBMC is the program exStbResolution. For
the remaining benchmarks, the verification times of
ESW-CBMC and CBMC are very close. It is important
to point out that, for all analyzed programs, the
encoding time of ESW-CBMC is slightly lower than the
encoding time of CBMC. The results in Table 3 thus show
quantitatively that ESW-CBMC scales significantly
better than CBMC for problems that involve a tight
interplay between non-linear arithmetic, bit operations,
pointers and array manipulations. In addition, both
tools were able to find previously undiscovered bugs
related to arithmetic overflow, invalid pointers and
pointer arithmetic in the programs jfdctint, blit and
pocsag, respectively.
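The --overflow-check option asks the checker to generate VCs for arithmetic overflow. As a hypothetical illustration of what such a VC states (not the bug found in jfdctint, and not code from either tool), the sketch below expresses "a * b does not overflow" for signed int without ever evaluating the possibly overflowing product, using the usual division-based bounds. The names mul_overflows and checked_mul are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Overflow condition for the signed multiplication a * b, stated without
   performing a * b (signed overflow is undefined behaviour in C). */
bool mul_overflows(int a, int b) {
    if (a == 0 || b == 0)
        return false;
    if (a > 0) {
        if (b > 0)
            return a > INT_MAX / b;  /* both positive */
        return b < INT_MIN / a;      /* a > 0, b < 0 */
    }
    if (b > 0)
        return a < INT_MIN / b;      /* a < 0, b > 0 */
    return a < INT_MAX / b;          /* both negative */
}

/* What an overflow VC amounts to at a multiplication site: assert that
   the operation cannot overflow before it is performed. */
int checked_mul(int a, int b) {
    assert(!mul_overflows(a, b));
    return a * b;
}
```

None of the divisions above can itself overflow, since INT_MIN is only ever divided by a positive value; this is why the check is split by the signs of the operands.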

5. Related Work 

SMT-based BMC is gaining popularity in the formal
verification community due to the advent of
sophisticated SMT solvers built over efficient SAT
solvers [?], [?], [?]. Previous work related to SMT-based
BMC [?], [?], [?], [?] combined decision procedures
for the theories of uninterpreted functions, arrays
and linear arithmetic only, but did not encode key
constructs of the ANSI-C programming language such
as bit operations, floating-point arithmetic and pointers.
Ganai and Gupta describe a verification framework
for BMC that extracts high-level design information
from an extended finite state machine (EFSM)
and applies several techniques to simplify the BMC
problem [?], [23]. However, the authors flatten
structures and arrays into scalar variables in such a
way that they use only the theory of integer and real
arithmetic to solve the verification problems
that arise in BMC.

Armando et al. also propose a BMC approach using
SMT solvers for C programs [?], [?]. However,
they only make use of linear arithmetic (addition and
multiplication by constants), arrays, records and
bit-vectors in order to solve the verification problems.
As a consequence, their SMT-CBMC prototype does
not address important constructs of the ANSI-C
programming language such as non-linear arithmetic and
bit-shift operations. Xu proposes the use of SMT-based
BMC to verify real-time systems, using TCTL
to specify the properties [?]. The author considers
an informal specification (written in English) of the
real-time system, models the variables using
integers and reals, and represents the clock constraints
using linear arithmetic expressions.

De Moura et al. present a bounded model checker
that combines propositional SAT solvers with
domain-specific theorem provers over infinite domains [24].
Unlike other related work, the authors abstract
the Boolean formula and then apply a lazy
approach to refine it incrementally. This
approach is applied to verify timed automata and RTL
descriptions. Jackson et al. [25] discharge several
verification conditions from programs written in the
SPARK language to the SMT solvers CVC3 and Yices as
well as to the theorem prover Simplify. The idea of this
work is to replace the Praxis prover by CVC3, Yices
and Simplify in order to generate counterexample
witnesses to verification conditions that are not valid.
This is an ongoing project, and several improvements
are planned to be integrated into their tool.

Table 3. Results of the comparison between CBMC and ESW-CBMC

Recently, a number of static checkers have been
developed in order to trade off scalability and precision.
Calysto is an efficient static checker that is able to
verify VCs related to arithmetic overflow, null-pointer
dereferences and assertions specified by the user [26].
The VCs are passed to the SMT solver SPEAR, which
supports Boolean logic and bit-vector arithmetic and is
highly customized for the VCs generated by Calysto.
However, Calysto does not support floating-point
operations and unsoundly approximates loops by unrolling
them only once; soundness is thus relinquished for
performance. Saturn is another efficient static checker
that scales to larger systems, but with the drawback of
losing precision, since it supports only the most common
integer operators and performs at most two unwindings
of each loop [27].
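Why unrolling a loop only once gives up soundness can be seen on a small hypothetical fragment whose out-of-bounds write happens only in the second iteration. The sketch below is a concrete simulation rather than a static analyzer: for a given loop count n, it explores only the first unwind iterations and reports whether an out-of-bounds index is reached. The fragment and the name bug_found_within are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_SIZE 2

/* Hypothetical program fragment:
     int buf[BUF_SIZE];
     for (int i = 0; i < n; i++) buf[2 * i] = 0;
   For n == 2, the write in the second iteration (buf[2]) is out of bounds. */

/* Explores only the first `unwind` iterations of that loop for a given n and
   reports whether an out-of-bounds index is reached within them. */
bool bug_found_within(int n, int unwind) {
    assert(unwind >= 0);
    for (int i = 0; i < n && i < unwind; i++) {
        if (2 * i >= BUF_SIZE)
            return true;  /* index 2*i is outside buf[0..BUF_SIZE-1] */
    }
    return false;         /* no violation in the explored iterations */
}
```

With n = 2, bug_found_within(2, 1) reports no error while bug_found_within(2, 2) finds it: an analysis limited to one iteration silently misses the defect.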

6. Conclusions 

In this work, we have investigated SMT-based
verification of ANSI-C programs, in particular embedded
software. We have described a new set of encodings
that allow us to reason accurately about bit operations,
unions, floating-point arithmetic, pointers and pointer
arithmetic, and we have also improved the performance
of SMT-based BMC for embedded software by making
use of high-level information to simplify the unrolled
formula. Our experiments constitute, to the best of
our knowledge, the first substantial evaluation of this
approach over industrial applications. The results show
that our approach outperforms CBMC [7] and
SMT-CBMC [?] for the verification of embedded
software. SMT-CBMC still has limitations not only
in verification time (due to the lack of simplification
based on high-level information), but also in
the encodings of important ANSI-C constructs used
in embedded software. CBMC is a bounded model
checker for full ANSI-C, but it has limitations due
to the fact that the size of the propositional formulae
increases significantly in the presence of large
datapaths, and high-level information is lost when the
verification conditions are converted into propositional
logic (preventing potential optimizations to reduce the
state space to be explored). For future work, we intend
to investigate the application of termination analysis [29]
and to incorporate reduction methods to simplify the
k-model.